Abstract. The paper studies randomness extraction from sources with
bounded independence and the issue of independence amplification of
sources, using the framework of Kolmogorov complexity. The dependency
of strings x and y is dep(x,y) = max{C(x) - C(x | y), C(y) - C(y | x)},
where C(·) denotes the Kolmogorov complexity. It is shown that there
exists a computable Kolmogorov extractor f such that, for any two n-bit
strings with complexity s(n) and dependency α(n), it outputs a string
of length s(n) with complexity s(n) - α(n) conditioned by any one of
the input strings. It is proven that the above are the optimal parameters
a Kolmogorov extractor can achieve. It is shown that independence
amplification cannot be effectively realized. Specifically, if (after
excluding a trivial case) there exist computable functions f_1 and f_2 such
that dep(f_1(x,y), f_2(x,y)) ≤ β(n) for all n-bit strings x and y with
dep(x,y) ≤ α(n), then β(n) ≥ α(n) - O(log n).

Keywords: Kolmogorov complexity, random strings, independent strings, 
randomness extraction. 



1 Introduction 

Randomness extraction is an algorithmic process that improves the quality of
a source of randomness. A source of randomness can be modeled as a finite
probability distribution, a finite binary string, or an infinite binary
sequence, and the randomness quality is measured, respectively, by min-entropy,
Kolmogorov complexity, and constructive Hausdorff dimension. All three settings
have been studied (the first one quite extensively).

It is desirable to have an extractor that can handle very general classes of
sources. Ideally, we would like to have an extractor that obtains random bits
from a single defective source under the sole assumption that there exists a
certain amount of randomness in the source. Unfortunately, this is not possible.
In the case of finite distributions, impossibility results for extraction from a
single source have been established by Santha and Vazirani [H] and Chor and
Goldreich [6]. In the case of finite binary strings and Kolmogorov complexity
randomness, Vereshchagin and Vyugin [35] show that there exist strings x with
relatively high Kolmogorov complexity so that any string shorter than x by a
certain amount and which has small Kolmogorov complexity conditioned by x
(in particular any such shorter string effectively constructed from x) has small
Kolmogorov complexity unconditionally. The issue of extraction from one infinite
sequence was first raised by Reimann and Terwijn [18], and after a series
of partial results [1811413], Miller [13] has given a strong negative answer, by
constructing a sequence x with dim(x) = 1/2 such that, for any Turing reduction
f, dim(f(x)) ≤ 1/2 (or f(x) does not exist; dim(x) is the constructive Hausdorff
dimension of the sequence x).

Therefore, for extraction from a general class of sources, one has to consider
the case of t ≥ 2 sources, and in this situation, positive results are possible.
Computable extractors from t = 2 distributions with min-entropy k = O(log n)
are constructed in [6, 8]. The construction of polynomial-time multisource
extractors is a difficult problem. Currently, for t = 2, the best results are by
Bourgain [4], who achieves k = (1/2 - α)n for a small constant α, and Raz [17],
who achieves k = polylog n for one distribution and k = (1/2 + α)n for the other
one. Polynomial-time extractors for 3 or more distributions with lower values
of k for all distributions are constructed in [11211711(3 15]. Dodis et al. [7]
construct a polynomial-time 2-source extractor for k > n/2, where the extracted
bits are random conditioned by one of the sources. Kolmogorov extractors for
t ≥ 2 sources also exist. Fortnow et al. [10] actually observe that any randomness
extractor for distributions is a Kolmogorov extractor, and Hitchcock et al. [11]
show that a weaker converse holds, in the sense that any Kolmogorov extractor
is a randomness condenser with very good parameters ("almost extractor").
For t = 2, the works [23, 25] construct computable Kolmogorov extractors with
better properties than those achievable by converting the randomness extractors
from [6] and [8]. The case of infinite sequences is studied in [24], which shows
that it is possible to effectively increase the constructive dimension if the input
consists of two sources.

All the positive results cited above require that the sources are independent.
At first glance, without independence, even the distinction between one and
two (or more) sources is not clear. However, independence can be quantified,
and then we can consider two sources having bounded independence. It then
becomes important to determine to what extent randomness extraction is possible
from sources with a limited degree of independence and whether the degree of
independence can be amplified.

We address these questions for the case of finite strings and Kolmogorov
complexity-based randomness. The level of dependency of two strings is based on
the notion of mutual information. The information that string x has about string
y is I(x : y) = C(y) - C(y | x), where C(y) is the Kolmogorov complexity of y and
C(y | x) is the Kolmogorov complexity of y conditioned by x. By the symmetry
of information theorem, I(x : y) ≈ I(y : x) ≈ C(x) + C(y) - C(xy) (see
footnote 1). We define the dependency of strings x and y as
dep(x,y) = max{I(x : y), I(y : x)}. Let S_{k,α} be the set of all pairs of
strings (x,y) such that C(x) ≥ k, C(y) ≥ k and dep(x,y) ≤ α. A Kolmogorov
extractor for the class of sources S_{k,α} is a function
f : {0,1}^n × {0,1}^n → {0,1}^m such that for all (x,y) ∈ S_{k,α}, C(f(x,y)) is
"close" to m. In other words, if we define the randomness deficiency of a string
z as |z| - C(z), we would like the randomness deficiency of f(x,y) to be small.
Our first result shows that the randomness deficiency of f(x,y) cannot be smaller
than essentially the dependency of x and y.

1 We use ≈, ⪯ and ⪰ for equalities and inequalities that hold within an additive
error bounded by O(log n).

Result 1 (informal statement; see full statement in Theorem 2). There exists
no computable function f with the property that, for all (x,y) ∈ S_{k,α}, the
randomness deficiency of f(x,y) is less than α - log n - O(log α). This holds
true even for high values of k such as k ⪰ n - α. The only condition is that
m ≥ α (m is the length of the output of f).

We observe that a similar result holds for the case of finite distributions.
Let S_{k,α} be the set of all pairs of random variables over {0,1}^n that have
min-entropy at least k and dependency at most α. (The min-entropy of X is
H_∞(X) = min_{a ∈ {0,1}^n, Prob[X = a] > 0} log(1/Prob[X = a]) and the
dependency of X and Y is H_∞(X) + H_∞(Y) - H_∞(X,Y).) Then, for every α and
m ≥ α and for every function f : {0,1}^n × {0,1}^n → {0,1}^m (even
non-computable), there exists (X,Y) ∈ S_{k,α} with dependency at most α and
min-entropy of f(X,Y) at most m - α.

Our next result (and the main technical contribution of this paper) is a
positive one. Keeping in mind Result 1, the best one can hope for is a Kolmogorov
extractor that from any strings x and y having dependency at most α obtains a
string z whose randomness deficiency is ≈ α. We show that this is possible in a
strong sense.

Result 2 (informal statement; see full statement in Theorem 4). For every
k > α, there exists a computable function f : {0,1}^n × {0,1}^n → {0,1}^m, where
m ≈ k, and such that for every (x,y) ∈ S_{k,α}, C(f(x,y) | x) = m - α - O(1) and
C(f(x,y) | y) = m - α - O(1).

Thus, optimal Kolmogorov extraction from sources with bounded independence
can be achieved effectively and in a strong form. Namely, the randomness
deficiency of the extracted string z is minimal (i.e., within an additive constant
of α) even conditioned by any one of the input strings and furthermore the length
of z is maximal. In [23] a similar but weaker theorem has been established. The
difference is that in [23] the length of the output is only ≈ k/2 and k has to
be at least 2α. The proof method of Result 2 extends the one used in [23] in a
non-trivial way (the novel technical ideas are described in Section 4.1). We note
that the Kolmogorov extractor that can be obtained from the randomness
extractor from [8] using the technique in [10] would have weaker parameters (more
precisely, the output length would be m ≈ k - 2α).

The dependency of two strings x and y is another measure of the
non-randomness in (x,y) considered as a joint source. Similarly to Kolmogorov
extractors that reduce randomness deficiency, it would be desirable to have an
algorithm that reduces dependency (equivalently, amplifies independence). The
main result of the paper shows that effective independence amplification is
essentially impossible. We say that two functions f_1 and f_2, mapping pairs of
n-bit strings to binary strings, amplify independence from level α(n) to level
β(n) (for β(n) < α(n)) if dep(f_1(x,y), f_2(x,y)) ≤ β(n) whenever
dep(x,y) ≤ α(n). Note that this is trivial to achieve if f_1(x,y) or f_2(x,y)
has Kolmogorov complexity at most β(n). Therefore, we also request that f_1(x,y)
and f_2(x,y) have Kolmogorov complexity at least β(n) + c log n, for some
constant c. However, as a consequence of Result 1 and Result 2, this is
impossible for any reasonable choice of parameters.

Result 3 (informal statement; see full statement in Theorem 5). Let f_1 and f_2
be computable functions such that for all (x,y) ∈ S_{k,α},
dep(f_1(x,y), f_2(x,y)) ≤ β(n) (and C(f_1(x,y)) ⪰ β(n), C(f_2(x,y)) ⪰ β(n)).
Then β(n) ⪰ α(n). This holds true for any α(n) ⪯ n/2 and any k ⪯ n - α(n).

Discussion of some technical aspects. As is typically the case in
probabilistic analysis, handling sources with bounded independence is difficult.
In this discussion, an (n,k) source is a random variable over {0,1}^n with
min-entropy k. Chor and Goldreich [6] show that a random function starting from
any two independent sources of type (n,k) extracts ≈ k/3 bits that are close to
random. Dodis and Oliveira [8], using a more refined probabilistic analysis
(based on a martingale construction), show the existence of an extractor that
from two independent sources X and Y of type (n,k_1) and respectively (n,k_2)
obtains ≈ k_1 bits that are close to random even conditioned by Y. Both
constructions use in an essential way the independence of the two input
distributions. The independence property allows one to reduce the analysis to
the simpler case in which the two input distributions are so-called flat
distributions. A flat distribution with min-entropy k assigns equal probability
mass to a subset of size 2^k of {0,1}^n and probability zero to the elements
outside this set. Extractors that extract from flat distributions admit a nice
combinatorial description. Namely, an extractor
E : {0,1}^n × {0,1}^n → {0,1}^m for two flat distributions X, Y with min-entropy
k corresponds to an N-by-N table (where N = 2^n) whose cells are colored with
M colors (where M = 2^m) and which satisfies the following balancing property:
for any set of colors A ⊆ [M] and for any K-by-K subrectangle of the table
(where K = 2^k), the fraction of A-colored cells is close to |A|/M. Such tables
can be obtained with the probabilistic method.

If the two input distributions are not independent, then the reduction to flat 
distributions is not known to be possible and the above approach fails. This is 
why almost all of the currently known randomness extractors (whether running 
in polynomial time, or merely computable) assume that the weak sources are 
perfectly independent (one exception is the paper [21]).

In this light, it is surprising that Kolmogorov extractors for input strings that
are not fully independent (actually with arbitrarily large level of dependency)
can be obtained via balanced tables, as we do in this paper. This approach
succeeds because the Kolmogorov complexity-based analysis views the level of
independence of sources as just another parameter and there is no need for
any additional machinery to handle sources that are not fully independent. We
believe (based on some partial results) that Kolmogorov complexity is a useful
tool not only for analyzing Kolmogorov extractors but also for circumventing
some of the technical difficulties in the investigation of multi-source extractors
for sources with bounded independence.

2 Preliminaries 

We work over the binary alphabet {0,1}; N is the set of natural numbers. A
string x is an element of {0,1}^*; |x| denotes its length; {0,1}^n denotes the
set of strings of length n; |A| denotes the cardinality of a finite set A; for
n ∈ N, [n] denotes the set {1,2,...,n}. We recall the basics of (plain)
Kolmogorov complexity (for an extensive coverage, the reader should consult one
of the monographs by Calude [5], Li and Vitanyi [15], or Downey and
Hirschfeldt [5]; for a good and concise introduction, see Shen's lecture
notes [20]). Let M be a standard Turing machine. For any string x, define the
(plain) Kolmogorov complexity of x with respect to M as
C_M(x) = min{|p| : M(p) = x}. There is a universal Turing machine U such that
for every machine M there is a constant c such that for all x,
C_U(x) ≤ C_M(x) + c. We fix such a universal machine U and, dropping the
subscript, we let C(x) denote the Kolmogorov complexity of x with respect to U.
We also use the concept of conditional Kolmogorov complexity. Here the
underlying machine is a Turing machine that, in addition to the read/work tape
which in the initial state contains the input p, has a second tape containing
initially a string y, which is called the conditioning information. Given such
a machine M, we define the Kolmogorov complexity of x conditioned by y with
respect to M as C_M(x | y) = min{|p| : M(p,y) = x}. There exist universal
machines of this type and they satisfy a relation similar to the above, but for
conditional complexity. We fix such a universal machine U and, dropping the
subscript U, we let C(x | y) denote the Kolmogorov complexity of x conditioned
by y with respect to U.

There exists a constant c_U such that for all strings x, C(x) ≤ |x| + c_U.
Strings x_1, x_2, ..., x_k can be encoded in a self-delimiting way (i.e., an
encoding from which each string can be retrieved) using
|x_1| + |x_2| + ... + |x_k| + 2 log |x_2| + ... + 2 log |x_k| + O(k) bits. For
example, x_1 and x_2 can be encoded as d(bin(|x_2|)) 01 x_1 x_2, where bin(n)
is the binary encoding of the natural number n and, for a string
u = u_1 ... u_m, d(u) is the string u_1 u_1 ... u_m u_m (i.e., the string u
with its bits doubled).
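As an illustration of the two-string encoding just described, the following is a minimal Python sketch. The function names (double_bits, encode_pair, decode_pair) are chosen here for illustration and do not appear in the paper; strings are represented as Python strings over the characters '0' and '1'.

```python
def double_bits(u):
    """Return u with every bit doubled, e.g. '101' -> '110011'."""
    return "".join(bit + bit for bit in u)


def encode_pair(x1, x2):
    """Encode (x1, x2) as doubled(bin(|x2|)) followed by '01', x1 and x2."""
    length_field = format(len(x2), "b")  # binary encoding of |x2|
    return double_bits(length_field) + "01" + x1 + x2


def decode_pair(code):
    """Invert encode_pair: recover (x1, x2) from the encoding."""
    i = 0
    length_bits = []
    # The doubled length field consists of pairs '00' or '11';
    # the first pair equal to '01' marks its end.
    while code[i:i + 2] != "01":
        length_bits.append(code[i])
        i += 2
    i += 2  # skip the '01' separator
    n2 = int("".join(length_bits), 2)
    payload = code[i:]
    split = len(payload) - n2  # x2 is the last n2 bits of the payload
    return payload[:split], payload[split:]
```

The overhead beyond |x_1| + |x_2| is twice the number of bits of bin(|x_2|) plus the two separator bits, which matches the 2 log |x_2| + O(1) term in the bound above.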

For every sufficiently large n and k ≤ n, for every n-bit string y,
2^{k - 2 log n} < |{x ∈ {0,1}^n | C(x | y) < k}| < 2^{k+1}.

The Symmetry of Information Theorem [26] states that for any two strings
x and y,

(a) C(xy) ≤ C(y) + C(x | y) + 2 log C(y) + O(1).

(b) C(xy) ≥ C(x) + C(y | x) - 2 log C(xy) - 4 log log C(xy) - O(1).

(c) If |x| = |y| = n, C(y) - C(y | x) ≥ C(x) - C(x | y) - 5 log n.

For integers m ≤ n, let b(n,m) = binom(n,0) + binom(n,1) + ... + binom(n,m).
Note that m(log n - log m) < log b(n,m) < m(log n - log m) + m log e + log(1 + m)
(since (n/m)^m < binom(n,m) < (en/m)^m).
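The two-sided estimate on log b(n,m) can be checked numerically for small parameters. The sketch below is only a sanity check, not part of the paper; it assumes that log denotes the base-2 logarithm and that 1 ≤ m ≤ n, and the helper names b and bounds_hold are introduced here for illustration.

```python
from math import comb, e, log2


def b(n, m):
    """Number of subsets of an n-element set having at most m elements."""
    return sum(comb(n, i) for i in range(m + 1))


def bounds_hold(n, m):
    """Check m(log n - log m) < log b(n,m) < m(log n - log m) + m log e + log(1+m)."""
    lower = m * (log2(n) - log2(m))
    upper = lower + m * log2(e) + log2(1 + m)
    return lower < log2(b(n, m)) < upper
```

In the proof of Theorem 1 the estimate is applied with n replaced by M = 2^m and m replaced by K = 2^{k+1} - 1, which is far outside the range one can enumerate; the check above only confirms the inequality at toy scale.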






All the Kolmogorov extractors will be ensembles of functions f = (f_n)_{n ∈ N},
of type f_n : ({0,1}^n)^t → {0,1}^{m(n)}. The parameter t is constant and
indicates the number of sources (in this paper we only consider t = 1 and t = 2).
For readability, we usually drop the subscript and the expression "function
f : {0,1}^n → {0,1}^m ..." is a substitute for "ensemble f = (f_n)_{n ∈ N}, where
f_n : {0,1}^n → {0,1}^{m(n)}, ..."

We say that an ensemble of functions f = (f_n) is computable with advice
k(n) if for every n there exists a string p of length at most k(n) such that
U(p, 1^n) outputs the table of the function f_n.

We use the following standard version of the Chernoff bounds. Let X_1, ..., X_n
be independent random variables that take the values 0 and 1, let X = Σ X_i
and let μ be the expected value of X. Then, for any 0 ≤ d ≤ 1,
Prob[X > (1 + d)μ] ≤ e^{-d^2 μ/3}.
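For intuition on how much slack this bound has, the sketch below compares it with the exact upper tail in the special case where the X_i are identically distributed with success probability p (so X is binomial). This special case and the helper names upper_tail and chernoff_bound are assumptions made for the illustration only.

```python
from math import comb, exp


def upper_tail(n, p, threshold):
    """Exact Prob[X > threshold] for X distributed as Binomial(n, p)."""
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(n + 1)
        if j > threshold
    )


def chernoff_bound(n, p, d):
    """The bound e^{-d^2 mu / 3} with mu = n * p."""
    mu = n * p
    return exp(-d * d * mu / 3)
```

For example, upper_tail(100, 0.5, 60) can be compared against chernoff_bound(100, 0.5, 0.2); the exact tail is noticeably smaller than the bound.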

2.1 Limited Independence 

Definition 1. (a) The dependency of two strings x and y is dep(x,y) =
max{C(x) - C(x | y), C(y) - C(y | x)}.

(b) Let d : N → N. We say that strings x and y have dependency at most d(n)
if dep(x,y) ≤ d(max(|x|, |y|)).

The Symmetry of Information Theorem implies that

|dep(x,y) - (C(x) - C(x | y))| ≤ O(log(C(x)) + log(C(y))).

If the strings x and y have length n, then

|dep(x,y) - (C(x) - C(x | y))| ≤ 5 log n.
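Kolmogorov complexity is not computable, so dep(x,y) cannot be evaluated exactly. Purely as a heuristic illustration of Definition 1, the sketch below replaces C by the length of a zlib-compressed encoding and C(x | y) by the extra compressed length of x once y has been seen. This compression proxy is an assumption of the example and is not used anywhere in the paper; it only gives a rough upper-bound flavor of the definition.

```python
import random
import zlib


def c(x):
    """Compression-based stand-in for C(x), in bytes."""
    return len(zlib.compress(x, 9))


def c_cond(x, y):
    """Stand-in for C(x | y): extra compressed bytes for x after y is known."""
    return max(c(y + x) - c(y), 0)


def dep(x, y):
    """Stand-in for dep(x,y) = max{C(x) - C(x | y), C(y) - C(y | x)}."""
    return max(c(x) - c_cond(x, y), c(y) - c_cond(y, x))


# Two pseudo-random 200-byte strings drawn from a fixed seed.
rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(200))
y = bytes(rng.getrandbits(8) for _ in range(200))
```

Under this proxy, a string is highly dependent on itself, while two separately generated pseudo-random strings show only a small dependency, which is the qualitative behavior the definition is meant to capture.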

3 Limits on Kolmogorov complexity extraction 
3.1 Limits on extraction from one source 

We first show that for any single-source function computable with small advice 
there exists an input with high Kolmogorov complexity whose image has low 
Kolmogorov complexity. 

Proposition 1. Let f : {0,1}^n → {0,1}^m be a function computable with advice
k(n). There exists x ∈ {0,1}^n with C(x) ≥ n - m and
C(f(x)) ≤ k(n) + log n + 2 log log n + O(1).

Proof. Let z be the most popular element in the image of f (i.e., the element in
{0,1}^m with the largest number of preimages under f; if there is a tie, take z
to be the smallest lexicographically). Since z can be described by the table of f
(which is obtained from the advice string and n) and O(1) bits, it follows that
C(z) ≤ k(n) + log n + 2 log log n + O(1). There are at least 2^{n-m} elements of
{0,1}^n mapping to z. Thus, there must be a string x of complexity at least
n - m mapping to z. □
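The counting step in this proof is a pigeonhole argument: 2^n inputs are mapped to at most 2^m outputs, so some output has at least 2^{n-m} preimages. A small Python sketch of the selection of z follows; the function name most_popular_image and the toy functions used with it are illustrative assumptions, and the exhaustive enumeration is feasible only for very small n.

```python
from collections import Counter
from itertools import product


def most_popular_image(f, n):
    """Return (z, count), where z is the value of f with the most n-bit
    preimages (ties broken lexicographically) and count is that number."""
    inputs = ("".join(bits) for bits in product("01", repeat=n))
    counts = Counter(f(x) for x in inputs)
    best = max(counts.values())
    z = min(value for value, cnt in counts.items() if cnt == best)
    return z, best


def first_two_bits(x):
    """Toy function from n-bit strings to 2-bit strings."""
    return x[:2]


def weight_mod_four(x):
    """Toy function: number of ones modulo 4, written on 2 bits."""
    return format(x.count("1") % 4, "02b")
```

For any function from 6-bit strings to 2-bit strings, the returned count is at least 2^{6-2} = 16, as the pigeonhole argument predicts.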






The following result is, in a sense, a strengthening of the previous proposition. 
It shows that there exists a string with relatively high Kolmogorov complexity, 
so that all functions computable with a given amount of advice fail to extract its 
randomness. We provide two incomparable combinations of parameters. Part (b)
is essentially a result of Vereshchagin and Vyugin [22].

Theorem 1. For every k, every n, and any computable function m:

(a) There exists a string x ∈ {0,1}^n such that for every function
f : {0,1}^n → {0,1}^m that is computable with advice k = k(n),

(1) C(x) ≥ n - log b(M,K) ≥ n - K(m - k + O(1)), where M = 2^m, K = 2^{k+1} - 1,
and

(2) C(f(x)) ≤ 2k + 2 log k + log n + 2 log log n + O(1) or f(x) is not defined,
and

(b) There exists a string x ∈ {0,1}^n such that for every function
f : {0,1}^n → {0,1}^m that is computable with advice k,

(1) C(x) ≥ n - K log(M + 1) ≈ n - Km, where M = 2^m, K = 2^{k+1} - 1, and

(2) C(f(x)) ≤ k + log n + 2 log log n + O(1) or f(x) is not defined.

Proof. Let f_i, i ∈ {1,...,K}, be the function computed by U(p_i, 1^n), where
p_i is the i-th string in {0,1}^{≤k}. We fix n and let m = m(n).

For each x ∈ {0,1}^n, consider the computations f_1(x), f_2(x), ..., f_K(x).
Some of them may not halt, and some of them may produce strings of length
different from m. Let Range(x) be the set of strings of length m that result from
these computations.

We first prove (a). Range(x) has one of b(M,K) possible values. It follows
that there exists one set that is equal to Range(x) for at least 2^n/b(M,K) many
strings x ∈ {0,1}^n. We say that such a set is frequent. Consider all frequent sets
and let s be the maximum size of a frequent set, taken over all frequent sets. If
we know s, we can enumerate all frequent sets of size s. Let {z_1,...,z_s} be the
first such set that appears in the enumeration. Note that each entry z_i can be
described by s, n, k, and i ≤ s. We can represent i by a string having length
exactly k + 1 bits and this string will therefore also describe k. It follows that
each such z_i satisfies

C(z_i) ≤ k + log n + log s + 2(log log n + log log s) + O(1)
       ≤ 2k + log n + 2 log log n + 2 log k + O(1),

where we have used the fact that i ≤ K and s ≤ K. The set {z_1,...,z_s} is equal
to at least 2^n/b(M,K) Ranges. So there exists x with C(x) ≥ n - log b(M,K)
such that Range(x) = {z_1,...,z_s}. This x satisfies the requirements in the
statement. □

We now prove (b) (following [H]). The goal, as before, is to produce a set
that is equal to Range(x) for many x ∈ {0,1}^n. We can do this, avoiding the
information s used in the previous proof, by the following greedy algorithm. By
dovetailing the computations f_i(x), for all x ∈ {0,1}^n and i ∈ [K], we start
enumerating strings produced by these computations, of which we retain only
those having length m. Let T = 2^m + 1. We run the enumeration till we find
a string z_1 that appears in at least 2^n/T Ranges. There may be no such z_1 and
we handle this situation later. We mark with (1) all Ranges that have been
identified to contain z_1. In the second iteration, we restart the enumeration till
we find a string z_2 ≠ z_1 that belongs to at least a 1/T fraction of Ranges marked
with (1). We re-mark these Ranges with (2). In general, at iteration i, we find
a string z_i, different from z_1,...,z_{i-1}, that belongs to at least a fraction
1/T of Ranges marked (i - 1). If we find such a z_i, we mark the Ranges that have
been discovered to contain it with (i).

We keep on doing this process till either (a) we have completed K iterations
and have obtained K distinct strings z_1,...,z_K in {0,1}^m, or (b) at iteration i,
the enumeration failed to produce z_i.

In case (a), the set {z_1,...,z_K} is equal to at least 2^n/T^K Ranges.

In case (b), the set {z_1,...,z_{i-1}} is a subset of at least 2^n/T^{i-1} Ranges,
and for each other string z ∈ {0,1}^m, the set {z_1,...,z_{i-1},z} is a subset of
less than 2^n/T^i Ranges. It follows that there exist at least
2^n/T^{i-1} - 2^m · 2^n/T^i = 2^n/T^i Ranges that are equal to the set
{z_1,...,z_{i-1}}.
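The arithmetic behind case (b) relies on the choice T = 2^m + 1: subtracting the at most 2^m · 2^n/T^i Ranges that contain one extra string from the 2^n/T^{i-1} Ranges that contain the current set leaves exactly 2^n/T^i. The following Python sketch checks this identity with exact rational arithmetic; the function name is a label introduced here for illustration.

```python
from fractions import Fraction


def lower_bound_on_equal_ranges(n, m, i):
    """Evaluate 2^n / T^(i-1) - 2^m * 2^n / T^i for T = 2^m + 1."""
    T = 2**m + 1
    containing_current_set = Fraction(2**n, T ** (i - 1))
    containing_one_more_string = 2**m * Fraction(2**n, T**i)
    return containing_current_set - containing_one_more_string
```

For instance, lower_bound_on_equal_ranges(10, 3, 2) equals Fraction(2**10, 9**2), in agreement with the stated value 2^n/T^i.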

To conclude, there exists a set {z_1,...,z_s}, with s ≤ K, that is equal to
Range(x) for at least 2^n/(2^m + 1)^K strings x ∈ {0,1}^n. Therefore there exists
such a string x with C(x) ≥ n - K log(2^m + 1). Each element z_i is described by
i ≤ K, n and k. We represent i on exactly k + 1 bits and this also describes k.
Therefore C(z_i) ≤ k + log n + 2 log log n + O(1). The conclusion follows. □



3.2 Limits on extraction from two sources 

The following theorem shows that there is no uniform function that from two
sources x and y that are α-dependent (i.e., dep(x,y) ⪯ α) produces an output
whose randomness deficiency is less than α - log n - O(log α).

Theorem 2. Let f : {0,1}^n × {0,1}^n → {0,1}^m be a computable function and
let α ∈ N, α ≤ m. Then there exists a pair of strings x ∈ {0,1}^n, y ∈ {0,1}^n
such that

C(x | y) ≥ n - α - 2 log n
C(y | x) ≥ n - α - 2 log n
C(f(x,y)) ≤ m - α + log n + 2 log α + O(1).

Proof. We consider first the case m = α. Let a be the most popular string in
the image of f. Then C(a) ≤ log n + O(1). Since f^{-1}(a) has at least 2^{2n-m}
elements, there exist strings x and y in {0,1}^n such that (x,y) ∈ f^{-1}(a) and
C(xy) ≥ 2n - m. Since C(xy) ≤ C(y) + C(x | y) + 2 log n,
C(xy) ≤ C(x) + C(y | x) + 2 log n, C(x) ≤ n + O(1) and C(y) ≤ n + O(1), it
follows that C(x | y) ≥ n - m - 2 log n and C(y | x) ≥ n - m - 2 log n. Also
C(f(x,y)) = C(a) ≤ log n + O(1).

If m > α, take g(x,y) to be the prefix of length α of f(x,y) and let x and y be
the strings obtained by applying the first case to g. Then
C(f(x,y)) ≤ C(g(x,y)) + (m - α) + 2 log α + O(1)
≤ log n + (m - α) + 2 log α + O(1), and the conclusion follows. □

The following is the analog of Theorem 2 for distributions.

Theorem 3. Let f : {0,1}^n × {0,1}^n → {0,1}^m be a function and let α ∈ N,
α ≤ m. Then there exist two random variables X and Y taking values in {0,1}^n
such that

H_∞(X) ≥ n - α
H_∞(Y) ≥ n - α
H_∞(X,Y) ≥ 2n - α
H_∞(f(X,Y)) ≤ m - α

Proof. Suppose first that m = α. Let a be the most popular string in the image
of f. Then |f^{-1}(a)| ≥ 2^{2n-m}. Take (arbitrarily) B ⊆ f^{-1}(a) with
|B| = 2^{2n-m}. Consider LEFT-B, the multiset of n-bit prefixes of strings in B,
and RIGHT-B, the multiset of n-bit suffixes of strings in B. The multiplicity of
a string x in LEFT-B is equal to the number of strings in B that have x as their
left half. Thus each string in LEFT-B has multiplicity at most 2^n. Counting
multiplicities, LEFT-B has 2^{2n-m} elements. Therefore LEFT-B has at least
2^{n-m} distinct strings. The same holds for RIGHT-B. We take X to be the random
variable obtained by choosing uniformly at random one element in the multiset
LEFT-B and Y to be the random variable obtained by choosing uniformly at random
one element in the multiset RIGHT-B. By the above discussion, for each
x ∈ {0,1}^n and y ∈ {0,1}^n,

Prob[X = x] ≤ 2^n / 2^{2n-m} = 1 / 2^{n-m},

Prob[Y = y] ≤ 2^n / 2^{2n-m} = 1 / 2^{n-m},

Prob[X = x, Y = y] ≤ 1 / 2^{2n-m}.

Thus, X and Y satisfy the requirements, and Prob[f(X,Y) = a] = 1.

Suppose now that m > α. We define g : {0,1}^n × {0,1}^n → {0,1}^α and
h : {0,1}^n × {0,1}^n → {0,1}^{m-α} by g(x,y) = prefix of length α of f(x,y) and
h(x,y) = suffix of length m - α of f(x,y). Let a ∈ {0,1}^α, the set B and the
random variables X and Y be defined as in the first part of the proof (i.e., the
case m = α) but with g replacing f, so that |B| = 2^{2n-α}. Note that
Prob[g(X,Y) = a] = 1. Let b be a string in {0,1}^{m-α} such that h^{-1}(b) ∩ B
has at least 2^{2n-α}/2^{m-α} elements. Then

Prob[f(X,Y) = ab] = Prob[g(X,Y) = a, h(X,Y) = b]
                  = Prob[h(X,Y) = b]
                  ≥ (2^{2n-α}/2^{m-α}) / 2^{2n-α} = 2^{-(m-α)}.

This concludes the proof. □

4 Kolmogorov complexity extraction 

We construct a Kolmogorov extractor that on input two n-bit strings with
Kolmogorov complexity at least s(n) and dependency at most α(n) outputs a string
of length ≈ s(n) having complexity ≈ s(n) - α(n) conditioned by any one of the
input strings.

4.1 Proof overview

For an easier orientation in the proof, we describe the main ideas of the method.
We also explain the non-trivial way in which the new construction extends the
technique from the earlier works [23] and [25]. For readability, some details are
omitted and some estimations are slightly imprecise. Let us fix, for the entire
discussion, x and y, two n-bit strings with C(x) ≥ s(n) and C(y) ≥ s(n) and
having dependency at most α(n). We denote N = 2^n, M = 2^m and S = 2^{s(n)}.
Let B_x = {u | C(u) ≤ C(x)} and B_y = {v | C(v) ≤ C(y)}. An N-by-N table
colored with M colors is a function T : [N] × [N] → [M]. If we randomly color
such a table T, with parameter m ⪯ 2s(n), then, with high probability, no color
appears in the B_x × B_y rectangle more than a 2 · (1/M) fraction of times (we
say that a table that satisfies the above balancing property is balanced in
B_x × B_y). Clearly (x,y) ∈ B_x × B_y and in a table T balanced in B_x × B_y
there are at most 2 · (1/M) · |B_x × B_y| ≈ 2 · (1/M) · 2^{C(x)} · 2^{C(y)} =
2^{C(x)+C(y)-m+1} entries with the color z = T(x,y). Therefore (x,y) is described
by the color z = T(x,y), the rank r of the (x,y) cell in the list of all
z-colored cells in B_x × B_y, by the table T, and by O(log n) additional bits
necessary to enumerate the list. Thus,
C(xy) ≤ C(z) + log r + C(table T) + O(log n). By the above estimation,
log r ≈ C(x) + C(y) - m. Also C(xy) ⪰ C(x) + C(y) - dep(x,y). Suppose that we
are able to get a balanced table T with C(table T) = O(log n), i.e., a table that
can be described with O(log n) bits. Then we would get that
C(T(x,y)) = C(z) ⪰ m - dep(x,y), which is our goal. How can we obtain
C(table T) = O(log n)? The normal approach would be to enumerate all possible
N-by-N tables with all possible colorings with M colors and pick the first one
that satisfies the balancing property. However, since B_x and B_y are only
computably enumerable, we can never be sure that a given table has the balancing
property. Therefore, instead of restricting to only B_x and B_y, we require that
a table T should satisfy the balancing property for all rectangles B_1 × B_2 with
sizes |B_1| ≥ S and |B_2| ≥ S, where S = 2^{s(n)}. The simple probabilistic
analysis involves only an additional union bound and carries over, showing that
such balanced tables exist at the cost that this time we need m < s(n). Now we
can pick in an effective way the smallest (in some canonical order) table T
having the balancing property, because we can check the balancing property in an
exhaustive manner (look at all S × S-sized rectangles, etc.). Therefore this
table T can be described with log n + O(1) bits, as desired. In this way, from
any x and y, each having Kolmogorov complexity at least s(n), we obtain
m ≈ s(n) bits having Kolmogorov complexity m - dep(x,y).

We reobtain m ≈ 2s(n) if we change the balancing property and require that
for any subset of colors A ⊆ [M] of size M/D, for D ≈ 2^{α(n)}, for any rectangle
B_1 × B_2 with sizes |B_1| ≥ S and |B_2| ≥ S, the fraction of A-colored cells in
B_1 × B_2 should be at most 2 · (|A|/M) = 2 · (1/D). Such a table can be obtained
with m ≈ 2s(n), and thus we can extract ≈ 2s(n) bits having Kolmogorov
complexity ≈ 2s(n) - dep(x,y), which is optimal.
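The exhaustive check of the first balancing property (every color fills at most a 2 · (1/M) fraction of every rectangle with both sides of size at least S) can be made concrete at toy scale. The Python sketch below is an illustration under stated assumptions, not the paper's construction: colors are the integers 0..M-1, the table is a list of rows, and only S-by-S rectangles are inspected, which suffices because the color fraction in a larger rectangle is an average of the fractions in its S-by-S subrectangles. The names is_balanced, cyclic_table and constant_table are hypothetical.

```python
from itertools import combinations


def is_balanced(table, S, M):
    """True if, in every S-by-S subrectangle of the table, each of the
    M colors occupies at most a 2/M fraction of the cells."""
    N = len(table)
    limit = 2 * S * S / M  # maximum allowed number of cells per color
    for rows in combinations(range(N), S):
        for cols in combinations(range(N), S):
            counts = [0] * M
            for u in rows:
                for v in cols:
                    counts[table[u][v]] += 1
            if max(counts) > limit:
                return False
    return True


# Toy parameters: N = 8 rows and columns, M = 4 colors, S = 4.
cyclic_table = [[(u + v) % 4 for v in range(8)] for u in range(8)]
constant_table = [[0] * 8 for _ in range(8)]
```

With these toy parameters the cyclic table passes the check while the constant table fails it. The running time is exponential in N, which is harmless for the argument above since only the existence of a computable check matters there.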

Let us consider next the problem of extracting bits that are random even
conditioned by x, and also conditioned by y. Suppose we use tables that satisfy
the first balancing property. We focus on B_x = {u | C(u) ≤ C(x)} and we call
a column v bad for a color a ∈ [M] if the fraction of a-colored cells in the strip
B_x × {v} of the table T is more than 2 · (1/M). The number of bad columns
is less than S; otherwise the table would have an S × S-sized rectangle that
does not have the balancing property. Note that a bad column for a color a
can be described by the color a and its rank in an enumeration of the columns
that are bad for a, plus additional O(log n) bits. So if v is a bad column, then
C(v) ⪯ m + s(n) ≈ 2s(n). Therefore if C(y) ⪰ 2s(n), y is good for any color.
An adaptation of the above argument shows that for z = T(x,y) it holds that
C(x | y) ⪯ C(z | y) + C(x) - m, which combined with
C(x | y) ≥ C(x) - dep(x,y), implies C(z | y) ⪰ m - dep(x,y). The above holds only
for y with C(y) ≥ 2s(n) and since the probabilistic analysis requires m to be
less than s(n), it follows that the number of extracted bits (which is m) is less
than half the Kolmogorov complexity of y.

The above technique was used in [23] and in [25]. To increase the number of
extracted bits, we introduce a new balancing property, which we dub rainbow
balancing. Fix some parameter D, which eventually will be taken such that
log D ≈ dep(x,y). Let A_D be the collection of sets of colors A ⊆ [M] with size
|A| ≈ M/D. Let B_1 ⊆ [N] be a set of size a multiple of S, let
v = {v_1 < v_2 < ... < v_S} be a set of S columns, and let A = (A_1,...,A_S) be
a tuple with each A_i in A_D. We say that a cell (u, v_i) such that
T(u, v_i) ∈ A_i is properly colored with respect to v and A. Finally we say that
a table T : [N] × [N] → [M] is (S,D)-rainbow balanced if for every B_1, every v,
and every A, the fraction of cells in B_1 × v that are properly colored with
respect to v and A is at most 2 · (1/D). The probabilistic method shows that such
tables exist provided m ⪯ s(n) and log D ⪯ s(n). Since the rainbow balancing
property can be effectively checked, there is an (S,D)-rainbow balanced table
T : [N] × [N] → [M] that can be described with log n + O(1) bits and m ≈ s(n)
and log D ≈ s(n).

Let z = T(x,y) and suppose that C(z | y) < m - t, where t = α(n) + c log n, for
some constant c that will be defined later (in the actual proof we do a tighter
analysis and we manage to take t = α(n) + O(1)). For each v, let
A_v = {w ∈ [M] | C(w | v) < m - t}. For log D ≈ α(n) + c log m, it holds that
A_v ∈ A_D for all v. Let us call a column v bad if the fraction of cells in
B_x × {v} that are A_v-colored is larger than 2 · (1/2^t). Analogously to our
earlier discussion, the number of bad columns is less than S and from here we
infer that if v is a bad column, then C(v) < s(n). Since C(y) ≥ s(n), it follows
that y is a good column. Therefore the fraction of cells in the B_x × {y} strip
of the table T that have a color in A_y is at most 2 · (1/2^t). Since (x,y) is
one of these cells, it follows that, given y, x can be described by the rank r of
(x,y) in an enumeration of the A_y-colored cells in the strip B_x × {y}, a
description of the table T, and by O(log n) additional bits necessary for doing
the enumeration. Note that there are at most
2 · (1/2^t) · |B_x| ≈ 2^{-t+1} · 2^{C(x)} cells in B_x × {y} that are A_y-colored
and, therefore, log r ≤ C(x) - t + 1. From here we obtain that
C(x | y) ≤ C(x) - t + 1 + O(log n) = C(x) - α(n) - c log n + O(log n).
Since C(x | y) ≥ C(x) - α(n), we obtain a contradiction for an appropriate
choice of the constant c. Consequently C(z | y) ≥ m - t = m - α(n) - c log n.
Similarly, C(z | x) ≥ m - α(n) - c log n. Thus we have extracted m ≈ s(n)
bits that have Kolmogorov complexity ≈ m - α(n) conditioned by x and also
conditioned by y.



4.2 Construction of the Kolmogorov extractor 

For $n$ and $m$ natural numbers, let $N = 2^n$ and $M = 2^m$. Henceforth, we identify
$\{0,1\}^n$ with $[N]$ and $\{0,1\}^m$ with $[M]$.

We consider functions of the form $T : [N] \times [N] \to [M]$, which we view
as $N$-by-$N$ tables whose cells are colored with colors in $[M]$. Let $S$ and $D$ be
parameters with $S < N$ and $D < M$.

Let $\mathcal{A}_D = \{A \mid A \subseteq [M],\ M/D \le |A| \le (M/D) \cdot m^2\}$. Thus, the elements of
$\mathcal{A}_D$ are those sets of colors having at least $M/D$ colors and not much more than
that.

Let $B_2 \subseteq [N]$ be a subset of size $S$; we name its elements $B_2 = \{v_1 <
v_2 < \ldots < v_S\}$. We view $B_2$ as a set of columns in the table. Let $(A_1, \ldots, A_S) \in
(\mathcal{A}_D)^S$. The cell $(u, v_i) \in [N] \times B_2$ is properly colored with respect to the columns
in $B_2$ and $(A_1, \ldots, A_S)$ if $T(u, v_i) \in A_i$. A similar notion of a cell being properly
colored with respect to rows in a set $B_1 \subseteq [N]$ will also be used.
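
The notion of a properly colored cell can be made concrete on a toy table. The following is only a minimal sketch, not part of the construction: the function name `properly_colored_count`, the 0-based indexing of rows, columns and colors, and the 4-by-4 example table are assumptions made for illustration.

```python
def properly_colored_count(table, rows, cols, color_sets):
    """Count cells (u, v_i) in rows x cols with table[u][v_i] in color_sets[i].

    cols is sorted so that v_1 < v_2 < ..., and color_sets[i] is the set A_i
    attached to the i-th column (0-based here, an assumption of this sketch).
    """
    ordered_cols = sorted(cols)
    if len(ordered_cols) != len(color_sets):
        raise ValueError("one color set per column is required")
    return sum(
        1
        for u in rows
        for i, v in enumerate(ordered_cols)
        if table[u][v] in color_sets[i]
    )


# Hypothetical toy table with N = 4 rows/columns and M = 4 colors.
table = [[(u + 2 * v) % 4 for v in range(4)] for u in range(4)]
rows = [0, 1, 2, 3]        # plays the role of B_1
cols = [1, 3]              # plays the role of B_2, so S = 2
color_sets = [{0}, {1}]    # the tuple (A_1, A_2)

count = properly_colored_count(table, rows, cols, color_sets)
fraction = count / (len(rows) * len(cols))
```

The returned fraction is the quantity that the rainbow balancing property bounds for every admissible choice of rows, columns and color sets.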

Definition 2. A table $T : [N] \times [N] \to [M]$ is $(S, D)$-rainbow balanced if

(a) • for all $B_1 \subseteq [N]$ of size $k \cdot S$ for some positive natural number $k$,

• for all $B_2 \subseteq [N]$ of size $S$,

• for all $(A_1, \ldots, A_S) \in (\mathcal{A}_D)^S$,

it holds that the number of cells in $B_1 \times B_2$ that are properly colored with
respect to columns $B_2$ and $(A_1, \ldots, A_S)$ is at most

$$2 \cdot \frac{m^2}{D} \cdot |B_1 \times B_2|,$$

and

(b) the similar relation holds if we switch the roles of $B_1$ and $B_2$.

Lemma 1. If $S > 12D + 3(1 + \ln D) M m^2 + 6D \ln(N/S)$, there exists a table
$T : [N] \times [N] \to [M]$ that is $(S, D)$-rainbow balanced.

Proof. We use the probabilistic method. We show that a randomly colored table
fails with probability $< 1/2$ to satisfy the proper coloring property with respect
to columns (property (a) in Definition 2). A similar calculation shows the similar
fact about proper coloring with respect to rows (property (b) in Definition 2).
Therefore we can conclude that an $(S, D)$-rainbow balanced table exists.

Observe that it is enough to consider sets $B_1$ of size exactly $S$ (because a set
of size $kS$ can be broken into $k$ sets of size $S$ and if each smaller set satisfies the
property, then the larger set will satisfy it as well).

Therefore, let us fix $B_1$ and $B_2$ subsets of $[N]$ of size $S$, let $B_1 = \{u_1 < \ldots <
u_S\}$ and $B_2 = \{v_1 < \ldots < v_S\}$. We fix $(A_1, \ldots, A_S) \in (\mathcal{A}_D)^S$.

Let $X_{ij}$ be the random variable which is 1 if the cell $(u_i, v_j)$ is properly
colored with respect to columns in $B_2$ and $(A_1, \ldots, A_S)$ (i.e., $T(u_i, v_j) \in A_j$),
and 0 otherwise. Then

$$\mathrm{Prob}[X_{ij} = 1] = \frac{|A_j|}{M} \in [1/D,\ m^2/D].$$

Let $X = \sum_{u_i \in B_1,\, v_j \in B_2} X_{ij}$. Then

$$\mu = E[X] = \sum_{i} \sum_{j} E[X_{ij}] = \sum_{j} S \cdot \frac{|A_j|}{M} \in [S^2/D,\ S^2 \cdot m^2/D].$$

By the Chernoff bounds,

$$\mathrm{Prob}[X > 2\mu] < e^{-(1/3)\mu} \le e^{-(1/3)(S^2/D)}.$$

It follows that

$$\mathrm{Prob}\left[X > 2 \cdot S^2 \cdot \frac{m^2}{D}\right] \le \mathrm{Prob}[X > 2\mu] < e^{-(1/3)(S^2/D)}.$$
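
The random experiment analysed above can be simulated for small parameters. This is a hedged sketch only: the parameter values, the helper name `count_properly_colored_random`, and the choice of color sets of size exactly M/D are assumptions for illustration, and a single seeded run is a sanity check, not evidence for the bound.

```python
import random


def count_properly_colored_random(S, M, color_sets, rng):
    """Color an S x S block uniformly at random with colors 0..M-1.

    Returns X, the number of cells (i, j) whose random color lies in
    color_sets[j], i.e. the number of properly colored cells.
    """
    x = 0
    for _i in range(S):
        for j in range(S):
            if rng.randrange(M) in color_sets[j]:
                x += 1
    return x


S, M, D = 20, 16, 4                      # hypothetical small parameters
rng = random.Random(0)
color_sets = [set(rng.sample(range(M), M // D)) for _ in range(S)]

# E[X] = sum_j S * |A_j| / M, as in the computation of mu above.
mu = S * sum(len(a) for a in color_sets) / M
X = count_properly_colored_random(S, M, color_sets, rng)
exceeds = X > 2 * mu
```

With these values each cell is properly colored with probability 1/4, so X is far below 2μ except with negligible probability, which is the event the Chernoff bound controls.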

We next take the union bound over all possible choices of $(A_1, \ldots, A_S) \in (\mathcal{A}_D)^S$,
and all possible choices of $B_1$ and $B_2$ subsets of $[N]$ of size $S$.

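For $T \in [M/D,\ M \cdot m^2/D]$, the number of sets in $[M]$ of size $T$ is $\binom{M}{T} \le
(eM/T)^T = e^T \cdot e^{T \ln(M/T)} \le e^{T + T \ln D}$. So the number of subsets of $[M]$ with sizes
between $M/D$ and $M \cdot m^2/D$ is at most

$$\sum_{T = M/D}^{M \cdot m^2/D} e^{T(1 + \ln D)}.$$

Denoting $q = e^{(1 + \ln D)}$, the above sum is

$$\begin{aligned}
\sum_{T = M/D}^{(M/D) m^2} e^{T(1 + \ln D)} &= q^{M/D} + q^{(M/D) + 1} + \ldots + q^{(M/D) m^2} \\
&= q^{M/D} \cdot \frac{q^{(M/D)(m^2 - 1) + 1} - 1}{q - 1} \\
&< q^{M/D} \cdot q^{(M/D) m^2} \cdot q^{-(M/D)} \cdot \frac{q}{q - 1} \\
&< 2 q^{(M/D) \cdot m^2} = 2 \cdot e^{(1 + \ln D) \cdot (M/D) \cdot m^2}.
\end{aligned}$$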
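
The geometric-sum estimate can be checked numerically for small parameters. This is a sketch under assumed values (M = 32, D = 8, m = 5, chosen so that M = 2^m and D divides M); the function name `geometric_sum_terms` is hypothetical.

```python
import math


def geometric_sum_terms(M, D, m):
    """Return (direct sum, closed form, upper bound 2*q^hi) for q = e^(1 + ln D)."""
    q = math.exp(1 + math.log(D))       # q = e * D
    lo = M // D                         # smallest admissible set size M/D
    hi = (M // D) * m * m               # largest admissible set size (M/D) * m^2
    total = sum(q ** T for T in range(lo, hi + 1))
    closed = q ** lo * (q ** (hi - lo + 1) - 1) / (q - 1)
    bound = 2 * q ** hi
    return total, closed, bound


total, closed, bound = geometric_sum_terms(M=32, D=8, m=5)
```

The last inequality only uses q/(q - 1) < 2, which holds because q = e·D is at least e.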

So the number of tuples $(A_1, \ldots, A_S) \in (\mathcal{A}_D)^S$ is less than

$$2^S \cdot e^{S \cdot (1 + \ln D) \cdot (M/D) \cdot m^2}.$$

The number of ways of choosing $B_1$ and $B_2$ is

$$\binom{N}{S} \cdot \binom{N}{S} \le \left(\frac{eN}{S}\right)^{2S} = e^{2S + 2S \ln(N/S)}.$$

For the union bound to give a probability $< e^{-1} < 1/2$ we need

$$(1/3)(1/D) S^2 > S + S(1 + \ln D)(M/D) m^2 + 2S + 2S \ln(N/S) + 1,$$

which holds true if the parameters satisfy the hypothesis. ∎
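
The relation between the hypothesis of Lemma 1 and the final inequality can be checked numerically. The sketch below is illustrative only: the function names and the sample parameters (n = 40, m = 5, D = 4, S = 2^13) are assumptions, not values used in the paper.

```python
import math


def lemma1_hypothesis(S, D, M, N, m):
    """Hypothesis of Lemma 1: S > 12D + 3(1 + ln D) M m^2 + 6D ln(N/S)."""
    return S > 12 * D + 3 * (1 + math.log(D)) * M * m**2 + 6 * D * math.log(N / S)


def union_bound_log(S, D, M, N, m):
    """Natural log of (upper bound on number of choices) * (Chernoff failure bound)."""
    chernoff = -(S * S) / (3 * D)                            # ln e^{-(1/3) S^2 / D}
    tuples = S + S * (1 + math.log(D)) * (M / D) * m**2      # uses 2^S < e^S
    pairs = 2 * S + 2 * S * math.log(N / S)                  # choices of B_1 and B_2
    return chernoff + tuples + pairs


n, m = 40, 5
N, M, D, S = 2**n, 2**m, 4, 2**13
hypothesis_holds = lemma1_hypothesis(S, D, M, N, m)
log_failure = union_bound_log(S, D, M, N, m)
```

When the hypothesis holds, dividing the required inequality by S and multiplying by 3D shows that the logarithm of the union bound falls below -1, which is what the proof needs.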
Theorem 4. For any computable functions $s(n)$ and $\alpha(n)$ with $n \ge s(n) \ge
\alpha(n) + 7 \log n + O(1)$, for every computable function $m(n)$ with $m(n) \le s(n) -
7 \log n$, there exists a computable function $E : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^{m(n)}$
such that for all $x$ and $y$ in $\{0,1\}^n$ if

(i) $C(x) \ge s(n)$, $C(y) \ge s(n)$,

(ii) $C(x) - C(x \mid y) \le \alpha(n)$ and $C(y) - C(y \mid x) \le \alpha(n)$,

then

(1) $C(E(x, y) \mid x) \ge m - \alpha(n) - O(1)$,

(2) $C(E(x, y) \mid y) \ge m - \alpha(n) - O(1)$.

Proof. The construction depends on a constant $C$ that will be determined later.
Let $s = \lfloor s(n) - 3 \log n \rfloor$, $S = 2^s$, $D = 2^{\alpha(n) + C + 2 \log m}$ and $t = \alpha(n) + C$.

By Lemma 1, there exists $T : [N] \times [N] \to [M]$ an $(S, D)$-rainbow balanced
table. We consider the smallest (in some canonical order) such table $T$ and define
$E(x, y)$ to be $T(x, y)$. Thus, the table $T$ can be described with $\log n + O(1)$ bits.

Let us fix $x$ and $y$ with $C(x) = t_1 \ge s(n)$, $C(y) = t_2 \ge s(n)$ and $\mathrm{dep}(x,y) \le
\alpha(n)$.

Let $z = T(x,y)$. We prove that $C(z \mid y) \ge m - \alpha(n) - C = m - t$ and
$C(z \mid x) \ge m - \alpha(n) - C = m - t$. Actually we show just the first relation (the
second one is similar).

Suppose $C(z \mid y) < m - t$.

Let $B_1 = \{u \in \{0,1\}^n \mid C(u) \le t_1\}$ and $B_2 = \{v \in \{0,1\}^n \mid C(v) \le t_2\}$. We
have $|B_1| < 2^{t_1 + 1}$ and $|B_2| < 2^{t_2 + 1}$. Take supersets $B_1' \supseteq B_1$ and $B_2' \supseteq B_2$ with
$|B_1'| = 2^{t_1 + 1}$ and $|B_2'| = 2^{t_2 + 1}$ (and $B_1', B_2' \subseteq [N]$). Note that the sizes of $B_1'$
and $B_2'$ are exact multiples of $S$.

For each $v \in \{0,1\}^n$, let $A_v = \{w \in \{0,1\}^m \mid C(w \mid v) < m - t\}$. Note that
$2^{m - t - 2 \log m} \le |A_v| < 2^{m - t}$ and thus $M/D \le |A_v| \le M \cdot m^2/D$. In other words,
for all $v \in \{0,1\}^n$, $A_v \in \mathcal{A}_D$.

We say that $v \in \{0,1\}^n$ is a bad column if the number of cells in $B_1 \times \{v\}$
that are $A_v$-colored is at least $2 \cdot |B_1'| / 2^t$.

Since $B_1 \subseteq B_1'$, if $v$ is a bad column, the number of $A_v$-colored cells in
$B_1' \times \{v\}$ is also at least $2 \cdot |B_1'| / 2^t$. It follows that the number of bad columns is
less than $S$. Otherwise, there would be $S$ columns $v_1, \ldots, v_S$ that fail to satisfy
(a) in Definition 2 for $B_1'$ and the tuple of colors $(A_{v_1}, \ldots, A_{v_S})$ (note that
$2^{-t} = m^2/D$ for the chosen $D$), and this is not
possible because the table $T$ is rainbow balanced.

The set of bad columns can be enumerated if we are given $t_1$, $m - t$ and
the table $T$. Therefore, if $v$ is a bad column, then $v$ can be described by its
rank in the enumeration of the bad columns and by the information needed for
the enumeration. Note that from $n$, we can calculate the table $T$ and $m - t$.
Therefore,

$$\begin{aligned}
C(v) &\le \log S + \log t_1 + \log n + 2 \log \log t_1 + 2 \log \log n + O(1) \\
&< s + 3 \log n.
\end{aligned}$$

Since $C(y) \ge s(n) \ge s + 3 \log n$, $y$ is a good column.

Let $G$ be the positions in the strip $B_1 \times \{y\}$ that are $A_y$-colored. Formally,
$G = \mathrm{proj}_1(T^{-1}(A_y) \cap (B_1 \times \{y\}))$. By assumption, $x$ belongs to the set $G$. Since
$y$ is a good column,

$$|G| < 2 \cdot \frac{|B_1'|}{2^t} = \frac{2^{t_1 + 2}}{2^t}.$$

The set $G$ can be enumerated given $y$, $t_1$, $m - t$ and the table $T$. Thus, given $y$,
$x$ can be described by its rank in the enumeration of $G$ and by the information
needed for the enumeration. This information is given as follows. We give the
constant $C$ and the rank of $x$ written on exactly $t_1 + 2 - t$ bits. Note that from $y$,
whose length is $n$, we can calculate the table $T$ and $m$ and $t$. Thus, from the
given information, we can reconstruct $t_1$. Therefore,

$$\begin{aligned}
C(x \mid y) &\le t_1 + 2 - t + \log C + 2 \log \log C + O(1) \\
&\le t_1 - t + \log C + 2 \log \log C + O(1),
\end{aligned}$$

where the constant in $O(1)$ does not depend on $C$. On the other hand, since $x$
and $y$ are at most $\alpha(n)$-dependent,

$$C(x \mid y) \ge t_1 - \alpha(n).$$

Combining the last two inequalities, it follows that $t \le \alpha(n) + \log C + 2 \log \log C +
O(1)$, which contradicts that $t = \alpha(n) + C$ (for an appropriate choice of $C$). ∎
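
The last step can be spelled out as a short chain. This is only a worked restatement of the two inequalities above, using the quantities already defined in the proof:

```latex
\begin{align*}
t_1 - \alpha(n) \;\le\; C(x \mid y) \;&\le\; t_1 - t + \log C + 2\log\log C + O(1) \\
\Longrightarrow\quad t \;&\le\; \alpha(n) + \log C + 2\log\log C + O(1) \\
\Longrightarrow\quad C \;&\le\; \log C + 2\log\log C + O(1)
  \qquad (\text{substituting } t = \alpha(n) + C).
\end{align*}
```

Since the constant hidden in $O(1)$ does not depend on $C$, the last line fails for every sufficiently large $C$, and any such $C$ is an appropriate choice.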



5 Impossibility of independence amplification 

The dependence of strings $x$ and $y$ is given by $\mathrm{dep}(x, y) = C(x) + C(y) -
C(xy)$. The smaller $\mathrm{dep}(x, y)$ is, the more independent the strings $x$ and $y$ are.
Thus, amplifying independence amounts to reducing dependence. An effective
dependence reducer would consist of two computable functions $f_1$ and $f_2$ that
for two functions $\alpha(n) > \beta(n)$ guarantee that for all $x, y$ of length $n$,

$$\mathrm{dep}(x,y) \le \alpha(n) \;\Rightarrow\; \mathrm{dep}(f_1(x,y), f_2(x,y)) \le \beta(n). \qquad (1)$$
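
Kolmogorov complexity is not computable, so dep cannot be evaluated exactly. Purely as an illustration of the definition, compressed length can stand in for $C(\cdot)$. The function names `c_proxy` and `dep_proxy` and the use of zlib are assumptions of this sketch; they play no role in the arguments of this section, and a compressor only gives a crude upper-bound-style estimate.

```python
import random
import zlib


def c_proxy(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for C(data)."""
    return len(zlib.compress(data, 9))


def dep_proxy(x: bytes, y: bytes) -> int:
    """Proxy for dep(x, y) = C(x) + C(y) - C(xy), with zlib in place of C."""
    return c_proxy(x) + c_proxy(y) - c_proxy(x + y)


rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(1000))
y = bytes(rng.getrandbits(8) for _ in range(1000))

dep_same = dep_proxy(x, x)     # second string is a copy of the first
dep_indep = dep_proxy(x, y)    # two independently drawn strings
```

In this toy setting a string paired with its own copy shows a large proxy dependence, while two independently drawn random strings show a proxy dependence close to zero.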

Note that, since $\mathrm{dep}(u, v) \le \beta(n)$ whenever $C(u) \le \beta(n)$ or $C(v) \le \beta(n)$,
dependency reduction would be achieved by two functions that simply output
strings with Kolmogorov complexity $\le \beta(n)$. To avoid this trivial and
uninteresting type of dependency reduction, we require that, in addition to
requirement (1), $C(f_1(x,y)) \succeq \beta(n)$ and $C(f_2(x,y)) \succeq \beta(n)$. More precisely,
we seek two computable functions $f_1 : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^{l(n)}$ and
$f_2 : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^{l(n)}$ that satisfy the following DEPENDENCY
REDUCTION TASK.
DEPENDENCY REDUCTION TASK for parameters $\alpha(n)$, $\beta(n)$, $s(n)$,
$l(n)$, and $a$.

For all $x \in \{0,1\}^n$, $y \in \{0,1\}^n$ with $\mathrm{dep}(x,y) \le \alpha(n)$, $C(x) \ge s(n)$ and
$C(y) \ge s(n)$ the following should hold:

1. $\mathrm{dep}(f_1(x,y), f_2(x,y)) \le \beta(n)$,

2. $C(f_1(x,y)) \ge \beta(n) + a \cdot \log n$ and $C(f_2(x,y)) \ge \beta(n) + a \cdot \log n$.

We show that effective independence amplification is essentially impossible. 

Theorem 5. Let $\alpha(n)$ be a function such that $\alpha(n) \le n/2 - 5 \log n$ and let
$\beta(n) = \alpha(n) - \log n - 3 \log \alpha(n)$. Let $s(n)$ be a function such that $s(n) \le n -
\alpha(n) - 2 \log n - O(1)$ and let $l(n)$ be a function such that $l(n) \ge \beta(n) + 8 \log n$.

There are no computable functions $f_1 : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^{l(n)}$ and
$f_2 : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^{l(n)}$ satisfying the DEPENDENCY REDUCTION
TASK for parameters $\alpha(n)$, $\beta(n)$, $s(n)$, $l(n)$ and $a = 8$.

Proof. Suppose there exist two computable functions $f_1$ and $f_2$ satisfying
the DEPENDENCY REDUCTION TASK for the given parameters and let
$f(x,y) = E(f_1(x,y), f_2(x,y))$, where $E : \{0,1\}^{l(n)} \times \{0,1\}^{l(n)} \to \{0,1\}^{m_E(n)}$
is the Kolmogorov extractor from Theorem 4 for parameters $m_E(n) = \alpha(n)$,
$s_E(n) = \beta(n) + 8 \log n$ and dependency $\alpha_E(n) = \beta(n)$. Theorem 2 promises
two strings $x$ and $y$ in $\{0,1\}^n$ such that $C(x \mid y) \ge s(n)$, $C(y \mid x) \ge s(n)$ and
$C(f(x,y)) \le m_E(n) - \alpha(n) + \log n + 2 \log \alpha(n) + O(1) = \log n + 2 \log \alpha(n) + O(1)$.
Note that $\mathrm{dep}(x,y) \le \alpha(n)$.

Let $u = f_1(x,y)$, $v = f_2(x,y)$. The assumption implies that $C(u) \ge s_E(n)$,
$C(v) \ge s_E(n)$ and $\mathrm{dep}(u,v) \le \alpha_E(n)$. The extractor $E$ guarantees that
$C(E(u,v)) \ge m_E(n) - \alpha_E(n) - O(1) = \alpha(n) - (\alpha(n) - \log n - 3 \log \alpha(n)) - O(1) =
3 \log \alpha(n) + \log n - O(1)$. Since $E(u,v) = f(x,y)$, this is in conflict with the
previous inequality. ∎
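
The conflict between the two bounds can be written as a single chain. This is a worked restatement of the inequalities in the proof, with the same symbols:

```latex
\begin{align*}
3\log\alpha(n) + \log n - O(1) \;\le\; C(f(x,y)) \;\le\; \log n + 2\log\alpha(n) + O(1)
\quad\Longrightarrow\quad \log\alpha(n) \;\le\; O(1).
\end{align*}
```

The two bounds are therefore incompatible as soon as $\alpha(n)$ exceeds a constant determined by the $O(1)$ terms.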